Lace: Non-blocking Split Deque for Work-Stealing
Abstract
Work-stealing is an efficient method for load balancing in fine-grained task parallelism. Typically, concurrent deques are used for this purpose. A disadvantage of many concurrent deques is that they require expensive memory fences for local deque operations. In this paper, we propose a new non-blocking work-stealing deque based on the split task queue. Our design uses a dynamic split point between the shared and the private portions of the deque, and only requires memory fences when shrinking the shared portion. We present Lace, an implementation of work-stealing based on this deque, with an interface similar to that of the work-stealing library Wool, and an evaluation of Lace based on several common benchmarks. We also implement a recent approach using private deques in Lace. We show that the split deque and the private deque in Lace have low overhead and high scalability comparable to those of Wool.
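The split-point idea described in the abstract can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not Lace's implementation: the class name `SplitDequeModel`, its method names, and the policy of moving the split point by half are hypothetical, and a real deque would use atomic operations where this model uses plain integers. The `fences` counter merely marks the one place (shrinking the shared portion) where the abstract says a memory fence is required.

```python
class SplitDequeModel:
    """Sequential model of a split deque (hypothetical names, no real concurrency).

    tasks[tail:split] is the shared portion that thieves may steal from.
    tasks[split:head] is the private portion that only the owner touches.
    """

    def __init__(self, capacity=64):
        self.tasks = [None] * capacity
        self.tail = 0    # index of the next task a thief would take
        self.split = 0   # boundary between shared and private portions
        self.head = 0    # next free slot, used by the owner only
        self.fences = 0  # counts points where a concurrent version would fence

    def push(self, task):
        """Owner: add a task at the private end."""
        if self.head == len(self.tasks):
            raise OverflowError("deque full")
        self.tasks[self.head] = task
        self.head += 1

    def grow_shared(self):
        """Owner: expose about half of the private tasks (assumed policy)."""
        private = self.head - self.split
        self.split += (private + 1) // 2

    def shrink_shared(self):
        """Owner: reclaim about half of the shared tasks (assumed policy)."""
        shared = self.split - self.tail
        if shared == 0:
            return False
        self.split -= (shared + 1) // 2
        self.fences += 1  # a real implementation would issue a fence here
        return True

    def pop(self):
        """Owner: take the newest task, shrinking the shared portion if needed."""
        if self.head == self.split and not self.shrink_shared():
            return None  # nothing left for the owner
        self.head -= 1
        return self.tasks[self.head]

    def steal(self):
        """Thief: take the oldest shared task, if any."""
        if self.tail == self.split:
            return None  # shared portion is empty
        task = self.tasks[self.tail]
        self.tail += 1
        return task


if __name__ == "__main__":
    d = SplitDequeModel()
    for t in (1, 2, 3, 4):
        d.push(t)
    d.grow_shared()            # tasks 1 and 2 become stealable
    print(d.steal())           # thief takes the oldest task
    print(d.pop(), d.pop())    # owner works on private tasks, no fence counted
    print(d.pop(), d.fences)   # owner must shrink the shared portion first
```

In this model, pushes and pops within the private portion never touch the fence counter; only the pop that finds the private portion empty and has to move the split point back does.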
Related resources
Dynamic Memory ABP Work-Stealing
The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based deques with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm’s synchron...
Deque-Free Work-Optimal Parallel STL Algorithms
This paper presents provably work-optimal parallelizations of STL (Standard Template Library) algorithms based on the work-stealing technique. Unlike previous approaches, where a deque for each processor is typically used to locally store ready tasks and where a processor that runs out of work steals a ready task from the deque of a randomly selected processor, the current paper instead presents ...
Verification of a Concurrent Deque Implementation
We prove the correctness of the concurrent deque component of a recent implementation of the work-stealing algorithm. Specifically, we prove that this concurrent deque implementation is synchronizable. Synchronizability is a weaker condition than the more traditional notion of serializability. Our concurrent deque implementation is not serializable, but its synchronizability makes it sufficient...
Defining Correctness Conditions for Concurrent Objects in Multicore Architectures
Correctness of concurrent objects is defined in terms of conditions that determine allowable relationships between histories of a concurrent object and those of the corresponding sequential object. Numerous correctness conditions have been proposed over the years, and more have been proposed recently as the algorithms implementing concurrent objects have been adapted to cope with multicore proc...
Analysis of Cilk Scheduler
Lecture Summary: 1. The Cilk Scheduler: We review the Cilk scheduler. 2. Location of Shallowest Thread: We define the depth of a thread and the shallowest thread. Next, we prove that the shallowest thread on a processor is either at the top of a deque or being executed. 3. Critical Threads: We construct a computation graph G′ similar to the computation graph G, such that when a thread has no incomp...
Publication date: 2014